Amendments to the Claims:

This listing of claims will replace all prior versions, and listings, of claims in the 
application: 
Listing of Claims:

1. (Currently Amended) A computer-implemented method of identifying 
whether a sequence is a semantic unit, the method comprising: 

calculating a first value representing a coherence of terms in the sequence; 
calculating a second value representing variation of context in which the sequence 
occurs; [[and]] 

determining whether the sequence is a semantic unit based at least in part on the 
first and second values; and

outputting an indication of whether the sequence is a semantic unit.

2. (Original) The method of claim 1, wherein the coherence of the terms in 
the sequence is calculated relative to a collection of documents. 

3. (Original) The method of claim 2, wherein the coherence of the terms in 
the sequence is calculated as a likelihood ratio that defines a probability of the sequence 
occurring in the collection of documents relative to parts of the sequence occurring. 

4. (Original) The method of claim 2, wherein the coherence of the terms in 
the sequence is calculated as:

$$
LR(A,B) = \frac{L\bigl(f(B), N\bigr)}{L\bigl(f(AB), f(A)\bigr) \cdot L\bigl(f({\sim}AB), f({\sim}A)\bigr)},
$$

where f(A) defines a number of occurrences of term A in the collection of documents,
f(~A) defines a number of occurrences of a term other than term A in the collection of
documents, f(B) defines a number of occurrences of term B in the collection of
documents, N defines a total number of events in the collection of documents, f(AB)
defines a number of times term A is followed by term B in the collection of documents,
and f(~AB) is a number of times a term other than A is followed by term B in the
collection of documents, wherein
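The definition of L does not survive in this listing. As an illustration only, the Python sketch below assumes the binomial likelihood L(k, n) = (k/n)^k (1 - k/n)^(n - k) that is common in collocation work, treats each adjacent token pair of a whitespace-tokenized toy corpus as one "event", and uses hypothetical function names (`log_l`, `likelihood_ratio`). It works in log space so that large counts do not underflow.

```python
import math


def log_l(k: int, n: int) -> float:
    """log of the assumed likelihood L(k, n) = (k/n)^k * (1 - k/n)^(n - k).

    Terms with a zero exponent are skipped, so 0 * log(0) is treated as 0.
    """
    if n == 0:
        return 0.0
    p = k / n
    total = 0.0
    if k > 0:
        total += k * math.log(p)
    if n - k > 0:
        total += (n - k) * math.log(1.0 - p)
    return total


def likelihood_ratio(tokens: list[str], a: str, b: str) -> float:
    """LR(A, B) = L(f(B), N) / (L(f(AB), f(A)) * L(f(~AB), f(~A))).

    Assumption: an event is an adjacent token pair, so A is counted in the
    first slot of each pair and B in the second slot.
    """
    firsts, seconds = tokens[:-1], tokens[1:]
    n = len(firsts)                      # N
    f_a = firsts.count(a)                # f(A)
    f_not_a = n - f_a                    # f(~A)
    f_b = seconds.count(b)               # f(B)
    f_ab = sum(1 for x, y in zip(firsts, seconds) if x == a and y == b)  # f(AB)
    f_not_ab = f_b - f_ab                # f(~AB)
    log_lr = log_l(f_b, n) - log_l(f_ab, f_a) - log_l(f_not_ab, f_not_a)
    return math.exp(log_lr)
```

On the toy corpus "new york new york a b a york", this sketch gives roughly 0.10 for (new, york) and roughly 0.23 for (a, b). Under the assumed L, the denominator is the best fit with separate probabilities of B after A and after other terms, so LR falls in (0, 1], and a smaller value indicates that B depends more strongly on A.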



5. (Original) The method of claim 1, wherein the coherence of the terms in
the sequence is defined as not being sufficient unless a threshold is met.

6. (Original) The method of claim 5, wherein the threshold is defined as:

$$
f(AB) > \frac{f(A) \cdot f(B)}{N},
$$

where f(A) defines a number of occurrences of term A in the collection of documents,
f(B) defines a number of occurrences of term B in the collection of documents, N defines
a total number of events in the collection of documents, and f(AB) defines a number of
times term A is followed by term B in the collection of documents.
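The threshold above compares the observed count f(AB) with f(A)·f(B)/N, which is the number of A-then-B occurrences one would expect if A and B occurred independently among the N events. A minimal sketch, assuming the four counts have already been gathered; the function name is hypothetical:

```python
def exceeds_independence_threshold(f_ab: int, f_a: int, f_b: int, n: int) -> bool:
    """Return True when f(AB) > f(A) * f(B) / N.

    Because N > 0, the test is done as f(AB) * N > f(A) * f(B), which keeps
    the comparison exact for integer counts.
    """
    if n <= 0:
        raise ValueError("N must be positive")
    return f_ab * n > f_a * f_b
```

For example, with f(AB) = 2, f(A) = 2, f(B) = 3 and N = 7, the expected count is 6/7, so the pair passes; a pair that never occurs (f(AB) = 0) can never pass.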

7. (Original) The method of claim 1, wherein the variation of context in
which the sequence occurs is calculated relative to a collection of documents. 

8. (Original) The method of claim 7, wherein the variation of context in 
which the sequence occurs is calculated as a measure of entropy of the context of the 
sequence. 

9. (Original) The method of claim 7, wherein the variation of context in 
which the sequence occurs, H(S), is calculated as

$$
H(S) = \mathrm{MIN}\bigl(HL(S), HR(S)\bigr),
$$

$$
HL(S) = -\sum_{w} \frac{f(wS)}{f(S)} \log \frac{f(wS)}{f(S)}
$$

and

$$
HR(S) = -\sum_{w} \frac{f(Sw)}{f(S)} \log \frac{f(Sw)}{f(S)},
$$

where MIN defines a minimum operation, S represents the sequence, f(wS) defines a
number of times a particular term, w, appears in the collection of documents followed by
the sequence, f(Sw) refers to a number of times the sequence is followed by w in the
collection of documents, and f(S) refers to a number of times the sequence S is present in
the collection of documents.
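The entropy measure can be illustrated with a short Python sketch. It assumes a whitespace-tokenized toy corpus in place of the collection of documents, uses the natural logarithm (the base only rescales the values), and uses hypothetical names (`context_counts`, `context_entropy`). An occurrence at the very start or end of the corpus has no neighbour on that side, so it adds nothing to that side's sum while still counting in f(S).

```python
import math
from collections import Counter


def context_counts(tokens: list[str], seq: list[str]) -> tuple[int, Counter, Counter]:
    """Return f(S), the left-context counts f(wS) and the right-context counts f(Sw)."""
    k = len(seq)
    starts = [i for i in range(len(tokens) - k + 1) if tokens[i:i + k] == seq]
    left = Counter(tokens[i - 1] for i in starts if i > 0)
    right = Counter(tokens[i + k] for i in starts if i + k < len(tokens))
    return len(starts), left, right


def side_entropy(counts: Counter, f_s: int) -> float:
    """-sum over w of (count / f(S)) * log(count / f(S))."""
    return -sum((c / f_s) * math.log(c / f_s) for c in counts.values())


def context_entropy(tokens: list[str], seq: list[str]) -> float:
    """H(S) = MIN(HL(S), HR(S))."""
    f_s, left, right = context_counts(tokens, seq)
    if f_s == 0:
        raise ValueError("sequence does not occur in the collection")
    return min(side_entropy(left, f_s), side_entropy(right, f_s))
```

In "a new york x b new york y c new york z", the sequence "new york" has three different words on each side, so H(S) = ln 3. Taking the minimum means a sequence is scored by its more predictable side: in "a s x b s x", "s" has varied left context but is always followed by "x", so H(S) = 0.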

10. (Original) The method of claim 7, wherein the variation of context in 
which the sequence occurs, HM(S), is calculated as 
$$
HM(S) = \mathrm{MIN}\bigl(HLM(S), HRM(S)\bigr),
$$

where MIN defines a minimum operation, HLM(S) is defined as a minimum of
$1 - \frac{f(wS)}{f(S)}$ for each term w in the collection of documents, HRM(S) is defined
as a minimum of $1 - \frac{f(Sw)}{f(S)}$ for each term w in the collection of documents,
f(wS) defines a number of times a particular term, w, appears in the collection of
documents followed by the sequence, f(Sw) refers to a number of times the sequence is
followed by w in the collection of documents, and f(S) refers to a number of times the
sequence is present in the collection of documents.
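Because a term w that never appears next to the sequence contributes 1 - 0 = 1, the minimum of 1 - f(wS)/f(S) over all terms is reached at the most frequent neighbour. The Python sketch below relies on that observation; the toy token list and the function names are assumptions for illustration.

```python
from collections import Counter


def context_counts(tokens: list[str], seq: list[str]) -> tuple[int, Counter, Counter]:
    """Return f(S), the left-context counts f(wS) and the right-context counts f(Sw)."""
    k = len(seq)
    starts = [i for i in range(len(tokens) - k + 1) if tokens[i:i + k] == seq]
    left = Counter(tokens[i - 1] for i in starts if i > 0)
    right = Counter(tokens[i + k] for i in starts if i + k < len(tokens))
    return len(starts), left, right


def min_complement_share(counts: Counter, f_s: int) -> float:
    """Minimum over all terms w of 1 - count(w) / f(S).

    Unseen terms give 1, so the minimum comes from the most frequent
    neighbour, or is 1 when the sequence has no neighbour on this side.
    """
    return 1.0 - max(counts.values(), default=0) / f_s


def hm(tokens: list[str], seq: list[str]) -> float:
    """HM(S) = MIN(HLM(S), HRM(S))."""
    f_s, left, right = context_counts(tokens, seq)
    if f_s == 0:
        raise ValueError("sequence does not occur in the collection")
    return min(min_complement_share(left, f_s), min_complement_share(right, f_s))
```

For "new york" in "a new york x b new york y c new york z", every neighbour occurs once out of three cases, so HM(S) = 2/3; a sequence that is always followed by the same word scores 0.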

11. (Original) The method of claim 7, wherein the variation of context in
which the sequence occurs, HC(S), is calculated as

$$
HC(S) = \mathrm{MIN}\bigl(HLC(S), HRC(S)\bigr),
$$

where MIN defines a minimum operation, HLC(S) is defined as $\sum_{w} \delta(wS)$ and
HRC(S) is defined as $\sum_{w} \delta(Sw)$, where δ(X) is defined as one if sequence X
occurs in the collection of documents and zero otherwise, where wS refers to a particular
word followed by the sequence, and where Sw refers to the sequence followed by a word.
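Summing an indicator over all words w simply counts the distinct words that ever precede (or follow) the sequence. A minimal Python sketch over a hypothetical whitespace-tokenized toy corpus:

```python
def hc(tokens: list[str], seq: list[str]) -> int:
    """HC(S) = MIN(HLC(S), HRC(S)).

    HLC(S) is the number of distinct words w for which wS occurs, and HRC(S)
    the number of distinct words w for which Sw occurs.
    """
    k = len(seq)
    starts = [i for i in range(len(tokens) - k + 1) if tokens[i:i + k] == seq]
    left = {tokens[i - 1] for i in starts if i > 0}
    right = {tokens[i + k] for i in starts if i + k < len(tokens)}
    return min(len(left), len(right))
```

Unlike the entropy and HM measures, this count ignores how often each neighbour occurs, so one frequent neighbour and one rare neighbour count the same.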

12. (Original) The method of claim 7, wherein the variation of context in 
which the sequence occurs, HP(S), is calculated as 

$$
HP(S) = \mathrm{MIN}\bigl(HLP(S), HRP(S)\bigr),
$$

where MIN defines a minimum operation, HLP(S) is defined as the number of
continuations to the left of the sequence that cover a predetermined percentage of all
cases in the collection of documents and HRP(S) is defined as the number of
continuations to the right of the sequence that cover the predetermined percentage of all
cases in the collection of documents.
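The claim does not say how the covering continuations are chosen. One plausible reading, assumed here, is to take continuations from most to least frequent and count how many are needed before their occurrences reach the predetermined share of the f(S) cases. The coverage fraction (0.9 by default) and all function names in this Python sketch are hypothetical.

```python
from collections import Counter


def context_counts(tokens: list[str], seq: list[str]) -> tuple[int, Counter, Counter]:
    """Return f(S), the left-context counts f(wS) and the right-context counts f(Sw)."""
    k = len(seq)
    starts = [i for i in range(len(tokens) - k + 1) if tokens[i:i + k] == seq]
    left = Counter(tokens[i - 1] for i in starts if i > 0)
    right = Counter(tokens[i + k] for i in starts if i + k < len(tokens))
    return len(starts), left, right


def continuations_to_cover(counts: Counter, f_s: int, fraction: float) -> int:
    """Fewest distinct continuations, most frequent first, whose counts reach
    `fraction` of the f(S) cases; all continuations if that is never reached."""
    target = fraction * f_s
    covered = 0
    for used, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered >= target:
            return used
    return len(counts)


def hp(tokens: list[str], seq: list[str], fraction: float = 0.9) -> int:
    """HP(S) = MIN(HLP(S), HRP(S)) for a hypothetical coverage fraction."""
    f_s, left, right = context_counts(tokens, seq)
    if f_s == 0:
        raise ValueError("sequence does not occur in the collection")
    return min(continuations_to_cover(left, f_s, fraction),
               continuations_to_cover(right, f_s, fraction))
```

In "a s x a s x a s y b s x", the word "s" is preceded by "a" three times out of four, so one continuation covers 75% of the cases but two are needed for 90%.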

13. (Original) The method of claim 1, wherein determining whether the 
sequence is a semantic unit includes comparing the first and second values to first and 
second thresholds and identifying the sequence as a semantic unit when the first and 
second values satisfy the first and second thresholds. 
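The claim leaves the two measures and the thresholds open. As one possible combination, offered only as a sketch, the Python example below uses the observed-to-expected ratio f(AB)·N / (f(A)·f(B)) (the quantity behind the claim 6 threshold) as the first value and the distinct-context count of claim 11 as the second. The threshold values 1.0 and 2, reading "satisfy" as "exceed" for coherence and "reach" for variation, the adjacent-pair event model, and the function names are all assumptions.

```python
def pair_coherence(tokens: list[str], a: str, b: str) -> float:
    """Observed f(AB) divided by the independence expectation f(A) * f(B) / N."""
    firsts, seconds = tokens[:-1], tokens[1:]
    n = len(firsts)
    f_a, f_b = firsts.count(a), seconds.count(b)
    f_ab = sum(1 for x, y in zip(firsts, seconds) if x == a and y == b)
    if f_a == 0 or f_b == 0:
        return 0.0
    return f_ab * n / (f_a * f_b)


def distinct_contexts(tokens: list[str], seq: list[str]) -> int:
    """Smaller of the numbers of distinct left and right neighbours of seq."""
    k = len(seq)
    starts = [i for i in range(len(tokens) - k + 1) if tokens[i:i + k] == seq]
    left = {tokens[i - 1] for i in starts if i > 0}
    right = {tokens[i + k] for i in starts if i + k < len(tokens)}
    return min(len(left), len(right))


def is_semantic_unit(tokens: list[str], a: str, b: str,
                     coherence_threshold: float = 1.0,
                     variation_threshold: int = 2) -> bool:
    """Accept the two-word sequence AB only if both values satisfy their thresholds."""
    coherence = pair_coherence(tokens, a, b)          # first value
    variation = distinct_contexts(tokens, [a, b])     # second value
    return coherence > coherence_threshold and variation >= variation_threshold
```

In "a new york x b new york y c new york z", both "new york" and "york x" occur more often than independence predicts, but only "new york" appears in varied contexts, so only it is accepted. This shows why the second value is useful: coherence alone would also accept the accidental pair.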

14. (Original) The method of claim 1, wherein the sequence includes three or 
more words. 

15. (Original) The method of claim 1, further including: 
applying one or more rules to the sequence, and 

wherein determining whether the sequence is a semantic unit is further based at
least in part on the application of the one or more rules.

16. (Currently Amended) A device comprising: 

a coherence component configured to calculate a coherence of multiple terms in a 
sequence of terms; 
a variation component configured to calculate a variation of context terms in a
collection of documents in which the sequence occurs; and

a decision component configured to determine whether the sequence constitutes a
semantic unit based at least in part on results of the coherence component and the
variation component, and output an indication of whether the sequence constitutes a
semantic unit.

17. (Original) The device of claim 16, wherein the context terms include 
terms to the left and right of the sequence. 

18. (Original) The device of claim 16, wherein the coherence of the terms in 
the sequence is calculated relative to the collection of documents. 

19. (Original) The device of claim 18, wherein the coherence of the terms in
the sequence is calculated as a likelihood ratio that defines a probability of the sequence 
occurring in the collection of documents relative to parts of the sequence occurring. 

20. (Original) The device of claim 16, wherein the variation of context in 
which the sequence occurs is calculated as a measure of entropy of the context of the
sequence.

21. (Original) The device of claim 20, wherein the variation of context in
which the sequence occurs, H(S), is calculated as

$$
H(S) = \mathrm{MIN}\bigl(HL(S), HR(S)\bigr),
$$

$$
HL(S) = -\sum_{w} \frac{f(wS)}{f(S)} \log \frac{f(wS)}{f(S)}
$$

and

$$
HR(S) = -\sum_{w} \frac{f(Sw)}{f(S)} \log \frac{f(Sw)}{f(S)},
$$

where MIN defines a minimum operation, S represents the sequence, f(wS) defines a
number of times a particular term, w, appears in the collection of documents followed by
the sequence, f(Sw) refers to a number of times the sequence is followed by w in the
collection of documents, and f(S) refers to a number of times the sequence S is present in
the collection of documents.

22. (Original) The device of claim 20, wherein the variation of context in 
which the sequence occurs, HM(S), is calculated as 

$$
HM(S) = \mathrm{MIN}\bigl(HLM(S), HRM(S)\bigr),
$$

where MIN defines a minimum operation, HLM(S) is defined as a minimum of
$1 - \frac{f(wS)}{f(S)}$ for each term w in the collection of documents, HRM(S) is defined
as a minimum of $1 - \frac{f(Sw)}{f(S)}$ for each term w in the collection of documents,
f(wS) defines a number of times a particular term, w, appears in the collection of
documents followed by the sequence, f(Sw) refers to a number of times the sequence is
followed by w in the collection of documents, and f(S) refers to a number of times the
sequence is present in the collection of documents.

23. (Original) The device of claim 20, wherein the variation of context in
which the sequence occurs, HC(S), is calculated as

$$
HC(S) = \mathrm{MIN}\bigl(HLC(S), HRC(S)\bigr),
$$

where MIN defines a minimum operation, HLC(S) is defined as $\sum_{w} \delta(wS)$ and
HRC(S) is defined as $\sum_{w} \delta(Sw)$, where δ(X) is defined as one if sequence X
occurs in the document collection and zero otherwise, where wS refers to a particular
word followed by the sequence, and where Sw refers to the sequence followed by a word.

24. (Original) The device of claim 20, wherein the variation of context in 
which the sequence occurs, HP(S), is calculated as 

HP(S) = MIN(HLP(S), HRP(S)) 
where MIN defines a minimum operation, HLP(S) is defined as the number of 
continuations to the left of the sequence that cover a predetermined percentage of all 
cases in the collection of documents and HRP(S) is defined as the number of 
continuations to the right of the sequence that cover the predetermined percentage of all 
cases in the collection of documents. 

25. (Currently Amended) The device of claim 16, wherein the decision 
component is further configured to compare the results of the coherence component and 
the variation component to threshold values and identify the sequence as a semantic unit 
based [[on]] at least in part on the comparisons. 

26. (Original) The device of claim 16, further comprising:

a heuristics component configured to apply one or more predefined rules to the 
sequence, wherein the decision component is further configured to determine whether the 
sequence constitutes a semantic unit based at least in part on application of the one or 
more rules. 

27. (Original) The device of claim 26, wherein the one or more rules are 
exclusionary rules that determine when certain sequences are not semantic units. 

28. (Currently Amended) A device comprising: 

means for calculating a first value representing a coherence of terms in a sequence 
of terms; 

means for calculating a second value representing variation of context in which 
the sequence occurs; [[and]] 

means for determining whether the sequence is a semantic unit based at least in 
part on the first and second values; and

means for outputting an indication of whether the sequence is a semantic unit.

29. (Currently Amended) A computer-readable medium memory device that 
includes programming instructions configured to control at least one processor, the 
computer-readable medium memory device comprising: 
instructions for calculating a first value representing a coherence of terms in a
sequence of terms;

instructions for calculating a second value representing variation of context in
which the sequence occurs; [[and]]

instructions for determining whether the sequence is a semantic unit based on the
first and second values; and

instructions for outputting an indication of whether the sequence is a semantic
unit.